NSF PAR Search | NSF Public Access Repository

Disparate Effect Of Missing Mediators On Transportability of Causal Effects

Mhasawade, Vishwali; Chunara, Rumi (May 2025, Proceedings of Machine Learning Research)

Free, publicly-accessible full text available May 24, 2026
Utilizing big data without domain knowledge impacts public health decision-making

https://doi.org/10.1073/pnas.2402387121

Zhang, Miao; Rahman, Salman; Mhasawade, Vishwali; Chunara, Rumi (September 2024, Proceedings of the National Academy of Sciences)

New data sources and AI methods for extracting information are increasingly abundant and relevant to decision-making across societal applications. A notable example is street view imagery, available in over 100 countries, and purported to inform built environment interventions (e.g., adding sidewalks) for community health outcomes. However, biases can arise when decision-making does not account for data robustness or relies on spurious correlations. To investigate this risk, we analyzed 2.02 million Google Street View (GSV) images alongside health, demographic, and socioeconomic data from New York City. Findings demonstrate robustness challenges; built environment characteristics inferred from GSV labels at the intracity level often do not align with ground truth. Moreover, as average individual-level behavior of physical inactivity significantly mediates the impact of built environment features by census tract, intervention on features measured by GSV would be misestimated without proper model specification and consideration of this mediation mechanism. Using a causal framework accounting for these mediators, we determined that intervening by improving 10% of samples in the two lowest tertiles of physical inactivity would lead to a 4.17 (95% CI 3.84–4.55) or 17.2 (95% CI 14.4–21.3) times greater decrease in the prevalence of obesity or diabetes, respectively, compared to the same proportional intervention on the number of crosswalks by census tract. This study highlights critical issues of robustness and model specification in using emergent data sources, showing the data may not measure what is intended, and ignoring mediators can result in biased intervention effect estimates.
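The mediation point in this abstract can be illustrated with a minimal sketch. It is not the authors' causal framework: it swaps in a textbook linear product-of-coefficients decomposition on synthetic data, and every name and coefficient in it (`A_TRUE`, `B_TRUE`, `DIRECT_TRUE`, the variables `x`, `m`, `y`) is a hypothetical chosen for illustration. Here `x` stands for a built environment feature, `m` for the mediator (physical inactivity), and `y` for a health outcome.

```python
import random


def centered_cross(u, v):
    """Sum of products of deviations from the mean (unnormalized covariance)."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def slope(x, y):
    """OLS slope of y regressed on x alone."""
    return centered_cross(x, y) / centered_cross(x, x)


def two_predictor_ols(x, m, y):
    """OLS coefficients of y regressed on x and m jointly (intercept implied)."""
    sxx, smm, sxm = centered_cross(x, x), centered_cross(m, m), centered_cross(x, m)
    sxy, smy = centered_cross(x, y), centered_cross(m, y)
    det = sxx * smm - sxm ** 2
    coef_x = (sxy * smm - smy * sxm) / det
    coef_m = (smy * sxx - sxy * sxm) / det
    return coef_x, coef_m


# Hypothetical structural coefficients for the synthetic example.
A_TRUE, B_TRUE, DIRECT_TRUE = -0.6, 0.9, -0.1

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
m = [A_TRUE * xi + rng.gauss(0, 0.5) for xi in x]
y = [DIRECT_TRUE * xi + B_TRUE * mi + rng.gauss(0, 0.5) for xi, mi in zip(x, m)]

total_effect = slope(x, y)                       # ignores the mediator
a_hat = slope(x, m)                              # x -> m path
direct_effect, b_hat = two_predictor_ols(x, m, y)  # x -> y and m -> y paths
indirect_effect = a_hat * b_hat                  # effect routed through m

print(f"total={total_effect:.3f} direct={direct_effect:.3f} "
      f"indirect={indirect_effect:.3f}")
```

In this linear setting the total effect equals the direct effect plus the indirect effect, so a model that omits the mediator attributes the whole total to the feature itself. Real analyses of the kind described in the abstract would also need confounder adjustment and uncertainty estimates, which this sketch leaves out.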
Full Text Available
Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database

https://doi.org/10.1371/journal.pdig.0000023

Singh, Harvineet; Mhasawade, Vishwali; Chunara, Rumi (April 2022, PLOS Digital Health)
Pollard, Tom J. (Ed.)
Modern predictive models require large amounts of data for training and evaluation, the absence of which may result in models that are specific to certain locations, their populations, and clinical practices. Yet, best practices for clinical risk prediction models have not yet considered such challenges to generalizability. Here we ask whether population- and group-level performance of mortality prediction models varies significantly when applied to hospitals or geographies different from the ones in which they are developed. Further, what characteristics of the datasets explain the performance variation? In this multi-center cross-sectional study, we analyzed electronic health records from 179 hospitals across the US with 70,126 hospitalizations from 2014 to 2015. The generalization gap, defined as the difference in model performance metrics across hospitals, is computed for area under the receiver operating characteristic curve (AUC) and calibration slope. To assess model performance by the race variable, we report differences in false negative rates across groups. Data were also analyzed using the causal discovery algorithm “Fast Causal Inference”, which infers paths of causal influence while identifying potential influences associated with unmeasured variables. When transferring models across hospitals, AUC at the test hospital ranged from 0.777 to 0.832 (1st-3rd quartile or IQR; median 0.801); calibration slope from 0.725 to 0.983 (IQR; median 0.853); and disparity in false negative rates from 0.046 to 0.168 (IQR; median 0.092). Distribution of all variable types (demography, vitals, and labs) differed significantly across hospitals and regions. The race variable also mediated differences in the relationship between clinical variables and mortality, by hospital/region. In conclusion, group-level performance should be assessed during generalizability checks to identify potential harms to the groups. Moreover, for developing methods to improve model performance in new environments, a better understanding and documentation of provenance of data and health processes are needed to identify and mitigate sources of variation.
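The two quantities named in this abstract, the generalization gap in AUC and the disparity in false negative rates across groups, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the toy labels, scores, groups, the 0.5 decision threshold, and the sign convention (development-hospital metric minus test-hospital metric) are all hypothetical, and the disparity is taken here as the largest minus the smallest group rate.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def false_negative_rate(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Share of true positives that the model predicted as negative."""
    preds_on_positives = [p for y, p in zip(labels, preds) if y == 1]
    if not preds_on_positives:
        raise ValueError("FNR needs at least one positive label")
    return sum(1 for p in preds_on_positives if p == 0) / len(preds_on_positives)


def fnr_disparity(labels, preds, groups) -> float:
    """Largest minus smallest group-level false negative rate (assumed definition)."""
    rates = []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates.append(false_negative_rate([labels[i] for i in idx],
                                         [preds[i] for i in idx]))
    return max(rates) - min(rates)


# Hypothetical toy data: one development hospital and one test hospital.
dev_labels, dev_scores = [1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.3, 0.2, 0.7, 0.4]
test_labels, test_scores = [1, 0, 1, 0, 1, 0], [0.9, 0.8, 0.6, 0.3, 0.4, 0.5]
test_groups = ["A", "A", "A", "B", "B", "B"]

# Assumed sign convention: positive gap means performance dropped on transfer.
gap = auc(dev_labels, dev_scores) - auc(test_labels, test_scores)

test_preds = [int(s >= 0.5) for s in test_scores]  # hypothetical threshold
disparity = fnr_disparity(test_labels, test_preds, test_groups)

print(f"AUC generalization gap: {gap:.3f}")
print(f"FNR disparity across groups: {disparity:.3f}")
```

The pairwise AUC is quadratic in the number of samples, which is fine for a sketch; a rank-based implementation from an established library would be the usual choice at the scale of the study's 70,126 hospitalizations.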
Full Text Available
Causal Multi-level Fairness

https://doi.org/10.1145/3461702.3462587

Mhasawade, Vishwali; Chunara, Rumi (July 2021, AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society)

Full Text Available
Machine learning and algorithmic fairness in public and population health

https://doi.org/10.1038/s42256-021-00373-4

Mhasawade, Vishwali; Zhao, Yuan; Chunara, Rumi (August 2021, Nature Machine Intelligence)

Full Text Available
Fairness Violations and Mitigation under Covariate Shift

https://doi.org/10.1145/3442188.3445865

Singh, Harvineet; Singh, Rina; Mhasawade, Vishwali; Chunara, Rumi (March 2021, FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency)
Full Text Available
Population-aware hierarchical Bayesian domain adaptation via multi-component invariant learning

https://doi.org/10.1145/3368555.3384451

Mhasawade, Vishwali; Rehman, Nabeel Abdur; Chunara, Rumi (April 2020, ACM Conference on Health, Inference and Learning)

Full Text Available
Role of the Built and Online Social Environments on Expression of Dining on Instagram

https://doi.org/10.3390/ijerph17030735

Mhasawade, Vishwali; Elghafari, Anas; Duncan, Dustin T.; Chunara, Rumi (February 2020, International Journal of Environmental Research and Public Health)

Online social communities are becoming windows for learning more about the health of populations, through information about our health-related behaviors and outcomes from daily life. At the same time, just as public health data and theory have shown that aspects of the built environment can affect our health-related behaviors and outcomes, it is possible that online social environments (e.g., posts and other attributes of our online social networks) can also shape facets of our life. Given the important role of the online environment in public health research and implications, factors which contribute to the generation of such data must be well understood. Here we study the role of the built and online social environments in the expression of dining on Instagram in Abu Dhabi: a ubiquitous social media platform, a city with a vibrant dining culture, and a topic (food posts) which has been studied in relation to public health outcomes. Our study uses available data on user Instagram profiles and their Instagram networks, as well as the local food environment measured through the dining types (e.g., casual dining restaurants, food court restaurants, lounges, etc.) by neighborhood. We find evidence that factors of the online social environment (profiles that post about dining versus profiles that do not post about dining) have different influences on the relationship between a user’s built environment and the social dining expression, with effects also varying by dining types in the environment and time of day. We examine the mechanism of the relationships via moderation and mediation analyses. Overall, this study provides evidence that the interplay of online and built environments depends on attributes of said environments and can also vary by time of day. We discuss implications of this synergy for precisely targeting public health interventions, as well as for using online data for public health research.
Full Text Available